July 6, 2019

Preview

  1. Introduction to single-cell RNA-seq
  2. Quality control and normalization
  3. Survey of downstream analysis methodology

Highly variable genes

  • Goal: identify a set of genes with high variability attributable to biology (over and above technical variability)
  • Useful for visualization, dimension reduction, clustering, marker gene selection, etc.
  • Challenging without an orthogonal measurement of technical variability (e.g. spike-ins)
    • with spike-ins: select genes with variance significantly above the mean-variance trend in the control genes
    • without spike-ins: select genes with variance significantly above the overall mean-variance trend in all endogenous genes (assumes the variance of most genes is purely technical)
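
A minimal sketch of the "without spike-ins" idea, using only the Python standard library. It is an illustration under stated assumptions, not the procedure of any particular package: the trend is a straight line fitted to log-variance versus log-mean, and the cutoff `min_residual` is an arbitrary threshold standing in for a proper significance test. The function name, the cutoff, and the toy data are all hypothetical.

```python
import math
import statistics

def highly_variable_genes(counts, min_residual=0.5):
    """counts: dict mapping gene name -> list of per-cell expression values.

    Returns the genes whose log-variance lies more than `min_residual`
    above a least-squares line fitted to log-variance vs log-mean across
    all genes. Assumes at least two genes with distinct, positive means.
    """
    stats = {}
    for gene, values in counts.items():
        mean = statistics.mean(values)
        var = statistics.variance(values)  # sample variance (n - 1)
        if mean > 0 and var > 0:           # the log is undefined otherwise
            stats[gene] = (math.log(mean), math.log(var))

    xs = [x for x, _ in stats.values()]
    ys = [y for _, y in stats.values()]
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)

    # Ordinary least squares for the overall mean-variance trend.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    intercept = y_bar - slope * x_bar

    # Keep genes sitting well above the trend.
    return [g for g, (x, y) in stats.items()
            if y - (intercept + slope * x) > min_residual]

# Hypothetical toy data: A-D follow the trend, H is far more variable
# than other genes with a similar mean.
toy = {
    "A": [0, 2, 0, 2],
    "B": [2, 6, 2, 6],
    "C": [6, 12, 6, 12],
    "D": [12, 20, 12, 20],
    "H": [0, 8, 0, 8],
}
print(highly_variable_genes(toy))
```

On this toy input only gene "H" is selected. Note that the trend is fitted to all genes, including the variable ones, which is exactly the assumption stated above: most genes must be dominated by technical variance for the trend to be a usable baseline.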

Highly variable genes with spike-ins

Highly variable genes without spike-ins

Dimension reduction

  • Useful to summarize & visualize relationships between cells in low dimensional space
  • Commonly used approaches:
    • PCA (Principal Components Analysis)
    • tSNE (t-distributed Stochastic Neighbor Embedding)
    • UMAP (Uniform Manifold Approximation and Projection)
  • Clustering can be carried out on reduced dimensions, but with caution

PCA vs Nonlinear methods

  • PCA attempts to extract the largest components of variation in the data
  • Nonlinear methods such as tSNE and UMAP attempt to map points to a global coordinate system that preserves local structure
    • densities of points and distances between them are not preserved in the embedding
    • better at visualizing rare subtypes
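
To make "largest components of variation" concrete, here is a small sketch that finds the first principal component by power iteration on the sample covariance matrix. It uses only the standard library and is meant for intuition, not for real data sizes; the function name and the toy input are hypothetical, and the sketch assumes the data are not constant and that the starting vector is not orthogonal to the leading direction.

```python
import math

def first_principal_component(rows, n_iter=200):
    """rows: list of cells, each a list of per-gene values.

    Returns a unit vector along the direction of largest variance,
    found by power iteration on the sample covariance matrix.
    """
    n, p = len(rows), len(rows[0])

    # Center each gene (column) at zero.
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]

    # p x p sample covariance matrix.
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
           for i in range(p)]

    v = [1.0] * p  # arbitrary start; must not be orthogonal to the answer
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Hypothetical toy data: the second gene is always twice the first,
# so all variation lies along the direction (1, 2).
cells = [[0, 0], [1, 2], [2, 4], [3, 6]]
print(first_principal_component(cells))
```

For this input the result is proportional to (1, 2). Projecting each centered cell onto this vector gives its coordinate on the first PC; nonlinear methods such as tSNE and UMAP have no such single global direction.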

image source: http://carbonandsilicon.net/rblogging/2018/02/27/UMAP_plots

Clustering

Differential expression

  • Traditional differential expression for bulk RNA-seq aims at detecting shifts in mean (fold change)
  • Typically, read counts are modeled with a two-parameter distribution (mean, dispersion)
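
As a rough illustration of the two bullets above, the sketch below assumes the two-parameter distribution is a negative binomial in its mean/dispersion form, which is a common choice for bulk read counts but is not stated on the slide. The helper names and the pseudocount are hypothetical.

```python
import math
import statistics

def nb_variance(mean, dispersion):
    """Variance of a negative binomial in the mean/dispersion
    parameterization: var = mean + dispersion * mean**2.
    A dispersion of 0 reduces to the Poisson case (var = mean)."""
    return mean + dispersion * mean ** 2

def log2_fold_change(group_a, group_b, pseudocount=1.0):
    """Shift in mean between two groups of counts on the log2 scale.
    The pseudocount (an assumption here) avoids taking the log of zero."""
    mean_a = statistics.mean(group_a)
    mean_b = statistics.mean(group_b)
    return math.log2((mean_b + pseudocount) / (mean_a + pseudocount))

# Hypothetical counts for one gene in two conditions.
control = [1, 1, 1]
treated = [3, 3, 3]
print(log2_fold_change(control, treated))
print(nb_variance(mean=10, dispersion=0.1))
```

The fold change summarizes each group by a single number, its mean, so any difference between conditions that leaves the mean unchanged is invisible to it.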

Bulk RNA-seq measures averages

Heterogeneity revealed by single-cell

scDD

scDD detects more subtle and complex changes
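
A toy illustration of why a mean-based test can miss such changes. This is not scDD's method; it is a generic sketch that compares two hypothetical samples with identical means using the largest gap between their empirical CDFs (the two-sample Kolmogorov-Smirnov statistic, without any p-value).

```python
import statistics

def ks_statistic(a, b):
    """Largest absolute gap between the empirical CDFs of samples a and b."""
    def ecdf(sample, x):
        return sum(v <= x for v in sample) / len(sample)

    points = sorted(set(a) | set(b))
    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in points)

# Hypothetical expression of one gene across cells in two conditions:
# one mode in the first condition, two modes in the second.
unimodal = [5, 5, 5, 5]
bimodal = [0, 0, 10, 10]

print(statistics.mean(unimodal) - statistics.mean(bimodal))  # difference in means
print(ks_statistic(unimodal, bimodal))                       # difference in shape
```

Both samples have the same mean, so the fold change is zero, yet their distributions differ clearly. Bulk measurements would only ever see the average.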

DE after clustering

R tools for downstream analysis

  • scater: visualization, quality control (Bioconductor)
  • scran: normalization, doublet detection, batch effect correction (Bioconductor)
  • SCnorm: normalization (Bioconductor)
  • sctransform: normalization (CRAN)
  • DropletUtils: removal of empty droplets (Bioconductor)
  • Seurat: normalization (CRAN)


There are many more tools I didn’t mention…

Growing number of computational tools

Curated list of tools from Sean